AITopics

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Vision (0.46)

Neural Information Processing Systems, Nov-19-2025, 20:08:31 GMT

Wang, Shijie

Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval

Though important, attribute modeling usually requires significant manual annotations and thus is labor-intensive.

artificial intelligence, machine learning, retrieval model

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Neural Information Processing Systems, Oct-9-2025, 06:21:01 GMT

Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

The relational representations relied upon by such methods, however, are abstract.

artificial intelligence, machine learning, natural language

Country:

North America > United States > California (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Straka, Jakub, Gruber, Ivan

SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing

arXiv.org Artificial Intelligence, Sep-1-2025

Self-supervised learning has emerged as a powerful tool for remote sensing, where large amounts of unlabeled data are available. In this work, we investigate the use of DINO, a contrastive self-supervised method, for pretraining on remote sensing imagery. We introduce SatDINO, a model tailored for representation learning in satellite imagery. Through extensive experiments on multiple datasets in multiple testing setups, we demonstrate that SatDINO outperforms other state-of-the-art methods based on the much more common masked autoencoders (MAE) and achieves competitive results in multiple benchmarks. We also provide a rigorous ablation study evaluating SatDINO's individual components. Finally, we propose a few novel enhancements, such as a new way to incorporate ground sample distance (GSD) encoding and adaptive view sampling. These enhancements can be used independently on our SatDINO model. Our code and trained models are available at: https://github.com/strakaj/

artificial intelligence, dataset, machine learning

2508.21402

Country: Europe (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)
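The SatDINO entry above builds on DINO's self-distillation objective. As a rough illustration of that objective only (the original DINO formulation as commonly described, not SatDINO's code), the sketch below computes the cross-entropy between a centered, sharpened teacher distribution and a student distribution for a single pair of views, plus the running center update. The temperatures and momentum are assumed example values, and all function names are hypothetical.

```python
import math


def log_softmax(logits: list[float], temperature: float) -> list[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def dino_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    center: list[float],
    student_temp: float = 0.1,   # assumed example value
    teacher_temp: float = 0.04,  # assumed example value; lower => sharper teacher
) -> float:
    """Cross-entropy H(teacher, student) for a single pair of views."""
    # Teacher: subtract the running center, then sharpen with a low temperature.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    # Student: log-probabilities at a higher temperature.
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def update_center(
    center: list[float], teacher_batch: list[list[float]], momentum: float = 0.9
) -> list[float]:
    """EMA of mean teacher logits; DINO pairs this centering with sharpening to avoid collapse."""
    batch_mean = [sum(col) / len(teacher_batch) for col in zip(*teacher_batch)]
    return [momentum * c + (1.0 - momentum) * b for c, b in zip(center, batch_mean)]


# Usage: the loss is lower when the student agrees with the teacher's peak class.
agree = dino_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0])
disagree = dino_loss([0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(agree < disagree)  # True
```

In full DINO training this loss is averaged over several global and local crops, and the teacher's weights are an exponential moving average of the student's; both are omitted here for brevity.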

Neural Information Processing Systems, Aug-18-2025, 22:01:01 GMT

Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

We also experimentally validate our theory on fine-grained distinguishability and obtain consistent results across multiple benchmarks.

artificial intelligence, information, machine learning

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Weber, Manuel, Beneke, Carly

PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data

arXiv.org Artificial Intelligence, Apr-29-2025

ABSTRACT

We propose PyViT-FUSE, a foundation model for earth observation data explicitly designed to handle multi-modal imagery by learning to fuse an arbitrary number of mixed-resolution input bands into a single representation through an attention mechanism. The learned patch tokens are further processed by a stack of vision transformers with a novel pyramidal structure. We train the model on a globally sampled dataset in a self-supervised manner, leveraging core concepts of the SwAV algorithm. We show the interpretability of the fusion mechanism by visualizing the attention scores, and we demonstrate the model's applicability to downstream tasks.

1 INTRODUCTION

Foundation models (FM) for earth observation (EO) have gained traction following the success of large language models (LLM) and their demonstration of scaling laws (Kaplan et al., 2020). The premise is that training larger models on vast datasets enhances performance. This idea has been central to computer vision, where datasets like ImageNet (Deng et al., 2009) have enabled pre-training in both supervised and unsupervised settings, leading to breakthroughs in model design and training.

large language model, machine learning, natural language

2504.1877

Country: North America > United States (0.47)

Genre: Research Report (0.66)

Industry: Energy > Renewable > Solar (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
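The PyViT-FUSE entry above describes fusing an arbitrary number of mixed-resolution bands into a single representation through attention. A minimal sketch of the general idea is single-query attention pooling. It assumes each band has already been resampled to a common patch grid and embedded into a vector of the same dimension, and it uses a fixed, hypothetical `query` in place of a learned one. It is an illustrative stand-in, not the paper's architecture.

```python
import math


def attention_fuse(
    band_tokens: list[list[float]], query: list[float]
) -> tuple[list[float], list[float]]:
    """Fuse N band embeddings (N may vary) into one vector via scaled dot-product attention.

    Assumes every token and the query share the same embedding dimension d.
    Returns (fused_vector, attention_weights).
    """
    d = len(query)
    # Scaled dot-product score for each band token.
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in band_tokens]
    # Softmax over bands (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of band tokens (values are the tokens themselves in this sketch).
    fused = [sum(w * tok[i] for w, tok in zip(weights, band_tokens)) for i in range(d)]
    return fused, weights


# Usage: three hypothetical band embeddings of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = attention_fuse(tokens, query=[2.0, 0.0])
print([round(w, 3) for w in weights])
```

Because the softmax runs over however many tokens are supplied, the number of bands can differ from one input to the next, and the returned weights are the kind of per-band attention scores that can be visualized to inspect what the fusion relies on.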

Belal, Yacine, Maouche, Mohamed, Mokhtar, Sonia Ben, Simonet-Boulogne, Anthony

GRANITE : a Byzantine-Resilient Dynamic Gossip Learning Framework

arXiv.org Artificial Intelligence, Apr-25-2025

Gossip Learning (GL) is a decentralized learning paradigm in which users iteratively exchange and aggregate models with a small set of neighboring peers. Recent GL approaches rely on dynamic communication graphs built and maintained using Random Peer Sampling (RPS) protocols. Thanks to graph dynamics, GL can achieve fast convergence even over extremely sparse topologies. However, the robustness of GL over dynamic graphs to Byzantine (model poisoning) attacks remains unaddressed, especially when Byzantine nodes attack the RPS protocol to scale up model poisoning. We address this issue by introducing GRANITE, a framework for robust learning over sparse, dynamic graphs in the presence of a fraction of Byzantine nodes. GRANITE relies on two key components: (i) a History-aware Byzantine-resilient Peer Sampling protocol (HaPS), which tracks previously encountered identifiers to reduce adversarial influence over time, and (ii) an Adaptive Probabilistic Threshold (APT), which leverages an estimate of Byzantine presence to set aggregation thresholds with formal guarantees. Empirical results confirm that GRANITE maintains convergence with up to 30% Byzantine nodes, improves learning speed via adaptive filtering of poisoned models, and obtains these results in graphs up to 9 times sparser than current theory dictates.

artificial intelligence, machine learning, node

2504.17471

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
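The GRANITE entry above mentions an Adaptive Probabilistic Threshold that sets aggregation thresholds from an estimate of Byzantine presence. The paper's actual rule is not given in this listing, so the following is only a hypothetical illustration of that general idea. Assuming each sampled peer is independently Byzantine with an estimated probability `byz_fraction`, it picks the smallest trim count `t` for which the binomial probability of seeing more than `t` Byzantine models in the sample is at most `delta`. It then aggregates with a coordinate-wise trimmed mean, a standard robust aggregator swapped in here for illustration. All function and parameter names are assumptions.

```python
from math import comb


def byzantine_threshold(sample_size: int, byz_fraction: float, delta: float) -> int:
    """Smallest t with P(X > t) <= delta, where X ~ Binomial(sample_size, byz_fraction).

    Hypothetical rule: assumes peers are independently Byzantine with probability byz_fraction.
    """
    cdf = 0.0
    for t in range(sample_size + 1):
        cdf += comb(sample_size, t) * byz_fraction**t * (1.0 - byz_fraction) ** (sample_size - t)
        if 1.0 - cdf <= delta:
            return t
    return sample_size


def trimmed_mean_aggregate(models: list[list[float]], trim: int) -> list[float]:
    """Coordinate-wise trimmed mean: drop the `trim` smallest and largest values per coordinate."""
    if 2 * trim >= len(models):
        raise ValueError("not enough models to trim safely; sample more peers")
    aggregated = []
    for coords in zip(*models):
        kept = sorted(coords)[trim : len(coords) - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Usage: an assumed 20% Byzantine estimate, 10 sampled peers and delta = 0.05.
print(byzantine_threshold(sample_size=10, byz_fraction=0.2, delta=0.05))  # 4
print(trimmed_mean_aggregate([[1.0, 10.0], [2.0, 20.0], [100.0, -50.0]], trim=1))  # [2.0, 10.0]
```

The design trade-off is visible in the numbers: a higher Byzantine estimate or a smaller `delta` raises the trim count, which discards more benign models too, so the sample size has to grow to keep enough values after trimming.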

Kim, Dongseob, Shim, Hyunjung

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification

arXiv.org Artificial Intelligence, Mar-21-2025

Multi-label classification is crucial for comprehensive image understanding, yet acquiring accurate annotations is challenging and costly. To address this, a recent study suggests exploiting unsupervised multi-label classification leveraging CLIP, a powerful vision-language model. Despite CLIP's proficiency, it suffers from view-dependent predictions and inherent bias, limiting its effectiveness. We propose a novel method that addresses these issues by leveraging multiple views near target objects, guided by Class Activation Mapping (CAM) of the classifier, and debiasing pseudo-labels derived from CLIP predictions. Our Classifier-guided CLIP Distillation (CCD) enables selecting multiple local views without extra labels and debiasing predictions to enhance classification performance. Experimental results validate our method's superiority over existing techniques across diverse datasets. The code is available at https://github.com/k0u-id/CCD.

artificial intelligence, image understanding, machine learning
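The CCD entry above describes combining predictions from multiple local views and debiasing CLIP-derived pseudo-labels. The exact procedure is not reproduced in this listing. The hypothetical sketch below covers only those two generic steps. First, it takes an element-wise maximum of per-view class scores, so an object visible in any one local view keeps a high score. Second, as a crude prior correction, it divides each class score by that class's mean score over the dataset and rescales each image's scores so its top class equals 1. Both the max aggregation and the prior-division debiasing are assumptions chosen for illustration, not the authors' CAM-guided method.

```python
def aggregate_views(view_scores: list[list[float]]) -> list[float]:
    """Element-wise max of class scores across local views of one image."""
    return [max(col) for col in zip(*view_scores)]


def debias_pseudo_labels(pseudo_labels: list[list[float]]) -> list[list[float]]:
    """Divide each class score by its dataset-wide mean, then rescale per image to [0, 1]."""
    n = len(pseudo_labels)
    class_prior = [sum(col) / n for col in zip(*pseudo_labels)]
    debiased = []
    for row in pseudo_labels:
        # Classes that are scored highly on every image (high prior) are down-weighted.
        adjusted = [p / q if q > 0 else 0.0 for p, q in zip(row, class_prior)]
        peak = max(adjusted)
        debiased.append([a / peak if peak > 0 else 0.0 for a in adjusted])
    return debiased


# Usage: two images, two classes; each image has two hypothetical local views.
image_a = aggregate_views([[0.9, 0.1], [0.7, 0.05]])
image_b = aggregate_views([[0.8, 0.3], [0.9, 0.2]])
print(debias_pseudo_labels([image_a, image_b]))
```

In this toy case class 0 receives a high raw score on both images, so after the prior correction the second image's comparatively strong class-1 score becomes its top label.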